Experiments in Constructing a Corpus of Discourse Trees: Problems, Annotation Choices, Issues

Authors

  • Daniel Marcu
  • Magdalena Romera
  • Estibaliz Amorrortu
Abstract

We present a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We focus on presenting the difficulties that we faced in designing a discourse annotation manual and on discussing the choices that we made in order to address these difficulties. We report reliability results concerning our agreement on building the rhetorical structure of 90 texts of three genres: 30 news stories, 30 editorials, and 30 scientific articles.

Similar resources

1 Alter 2 Loosen 3 Change Sequence 1 Alter 2 Loosen 3 Change Sequence Means 2 Loosen 3 Change Means

We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.

Semantic Annotation for Generation: Issues in annotating a corpus to develop and evaluate discourse entity realization algorithms

We are annotating a corpus with information relevant to discourse entity realization, and especially the information needed to decide which type of NP to use. The corpus is being used to study correlations between NP type and certain semantic or discourse features, to evaluate hand-coded algorithms, and to train statistical models. We report on the development of our annotation scheme, the prob...

Constructing an Annotated Story Corpus: Some Observations and Issues

This paper discusses our ongoing work on constructing an annotated corpus of children’s stories for further studies on the linguistic, computational, and cognitive aspects of story structure and understanding. Given its semantic nature and the need for extensive common sense and world knowledge, story understanding has been a notoriously difficult topic in natural language processing. In partic...

Examining the Effect of Ideology and Idiosyncrasy on Lexical Choices in Translation Studies within the CDA Framework

Using a critical discourse analytic model of translation criticism, the present study attempts to explore the effect of ideology and idiosyncrasy on the lexical choices in translation studies. The study employed a descriptive approach to answer two research questions: Is there any relationship between ideology and idiosyncratic features of translators' lexical choices? And if yes, can it be ana...

Building a Discourse-Annotated Dutch Text Corpus

We are compiling a corpus of Dutch texts annotated with discourse structure and lexical cohesion, containing initially 80 texts from expository and persuasive genres. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences. We are also exploring the possibilities of automatic text segmentation and semi-automatic discourse an...

Publication date: 1999